Automated Linking of Historical Data
Abstract
The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5 percent) false positive rates. The automated methods trace out a frontier illustrating the trade-off between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods. (JEL C81, C83, N01, N31, N32)
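The trade-off between the false positive rate and the match rate can be illustrated with a minimal Python sketch. It is not the authors' code (they provide Stata commands). The function name `linkage_rates`, the dictionary layout, and the exact rate definitions (match rate as the share of records linked, false positive rate as the share of links that disagree with a benchmark such as hand links) are assumptions made for illustration.

```python
def linkage_rates(candidate_links, benchmark_links, n_records):
    """Summarize a linked sample against a benchmark (hypothetical helper).

    candidate_links and benchmark_links map a record id in the first source
    to the id it was linked to in the second source. n_records is the number
    of records in the first source that the method tried to link.
    Assumption: a candidate link with no benchmark entry counts as wrong.
    """
    n_linked = len(candidate_links)
    n_wrong = sum(
        1 for rec, target in candidate_links.items()
        if benchmark_links.get(rec) != target
    )
    n_correct = n_linked - n_wrong
    return {
        "match_rate": n_linked / n_records,
        "true_match_rate": n_correct / n_records,
        "false_positive_rate": n_wrong / n_linked if n_linked else 0.0,
    }


# Toy data: four records, benchmark links treated as ground truth.
benchmark = {1: "a", 2: "b", 3: "c", 4: "d"}
conservative = {1: "a", 2: "b"}                 # links fewer records
liberal = {1: "a", 2: "b", 3: "x", 4: "d"}      # links more, one wrongly

print(linkage_rates(conservative, benchmark, n_records=4))
print(linkage_rates(liberal, benchmark, n_records=4))
```

In this toy data the conservative rule links half of the records with no errors, while the liberal rule links all of them at the cost of one incorrect link, which mirrors the kind of frontier the abstract describes.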
Similar resources
Linking Historical Data on the Web
Linked Data today available on the Web mostly represent snapshots at particular points in time. The temporal aspect of data is mostly taken into account only by adding and removing triples to keep datasets up-to-date, thus neglecting the importance to keep track of the evolution of data over time. To overcome this limitation, we introduce the LinkHisData framework to automatize the creation and...
Linking Individuals Across Historical Sources: a Fully Automated Approach
Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. We suggest a fully automated method for linking historical datasets that enables researchers to create samples that minimize type I (false positives) and type...
Using Historical Wafermap Data for Automated Yield Analysis
To be productive and profitable in a modern semiconductor fabrication environment, large amounts of manufacturing data must be collected, analyzed, and maintained. This includes data collected from in-line and off-line wafer inspection systems and from the process equipment itself. This data is increasingly being used to design new processes, control and maintain tools, and to provide the infor...
Linking Requirements and Design Data for Automated Functional Evaluation
This paper presents a methodology for automating the evaluation of complex hierarchical designs using black-box testing techniques. Based on an exploration model for design, this methodology generates evaluation tests using a novel semantic graph data model which captures the relationships between the related design and requirements data. Using these relationships, equivalent tests are generate...
The environmental-data automated track annotation (Env-DATA) system: linking animal tracks with environmental data
BACKGROUND The movement of animals is strongly influenced by external factors in their surrounding environment such as weather, habitat types, and human land use. With advances in positioning and sensor technologies, it is now possible to capture animal locations at high spatial and temporal granularities. Likewise, scientists have an increasing access to large volumes of environmental data. En...
Journal
Journal title: Journal of Economic Literature
Year: 2021
ISSN: 2328-8175, 0022-0515, 1547-1101
DOI: https://doi.org/10.1257/jel.20201599